Classification Algorithm Sensitivity to Training Data with Non Representative Attribute Noise

Authors

  • Michael V. Mannino
  • Yanjuan Yang
  • Young Ryu
Abstract

We present an empirical comparison of major classification algorithms when training data contains attribute noise levels not representative of field data. Although conventional wisdom indicates that training data should contain noise representative of field data, it can be difficult to ensure representative noise levels. To study classification algorithm sensitivity, we develop an innovative experimental design using noise situation (under or over representation of training noise), algorithm, noise level, and training set size as factors. We consider situations of uniform attribute noise levels on all attributes, variable noise levels, and noise levels assigned by attribute importance. Our results contradict conventional wisdom, indicating that investments to achieve representative noise levels may not be worthwhile. In general, over representative training noise should be avoided, while under representative training noise is less of a concern. However, the interactions among algorithm, noise level, and training set size indicate that these general results may not apply to particular practice situations.
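The factorial design described in the abstract can be pictured with a small sketch. The abstract does not specify the noise model or the factor levels, so everything here is an assumption made for illustration: noise is modeled as replacing an attribute value with a uniform draw from its domain, and the names `add_attribute_noise`, `noise_situation`, and the listed factor levels are hypothetical, not the authors' own.

```python
import itertools
import random


def add_attribute_noise(rows, domains, noise_level, rng):
    """Return a copy of rows in which each attribute value is replaced,
    with probability noise_level, by a uniform draw from its domain.

    Assumed noise model; the draw may coincide with the original value.
    """
    noisy = []
    for row in rows:
        noisy.append([
            rng.choice(domains[j]) if rng.random() < noise_level else value
            for j, value in enumerate(row)
        ])
    return noisy


def noise_situation(train_noise, field_noise):
    """Label the training noise level relative to the field noise level."""
    if train_noise < field_noise:
        return "under"
    if train_noise > field_noise:
        return "over"
    return "representative"


# Hypothetical factor levels, chosen only to illustrate the design grid.
ALGORITHMS = ["decision_tree", "naive_bayes"]
FIELD_NOISE = [0.1, 0.2]
TRAIN_NOISE = [0.0, 0.1, 0.2, 0.3]
TRAIN_SIZES = [100, 1000]

# One experimental cell per combination of factor levels.
design = [
    (alg, field, train, size, noise_situation(train, field))
    for alg, field, train, size in itertools.product(
        ALGORITHMS, FIELD_NOISE, TRAIN_NOISE, TRAIN_SIZES)
]
```

In a sketch like this, each cell would train the named algorithm on data corrupted at the training noise level and evaluate it on data corrupted at the field noise level. The uniform case in the abstract corresponds to one `noise_level` shared by all attributes; the variable and importance-based cases would need a per-attribute level instead.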

Similar resources

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing, where voice characteristics are used to investigate the identity of an individual. In this paper, a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and an atom correction post-processing method. Similar to genera...

Use of Bad Training Data for Better Predictions

We show how randomly scrambling the output classes of various fractions of the training data may be used to improve predictive accuracy of a classification algorithm. We present a method for calculating the "noise sensitivity signature" of a learning algorithm which is based on scrambling the output classes. This signature can be used to indicate a good match between the complexity of the class...

Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms

Because acquiring field reference data with conventional methods is limited and time-consuming, in recent years it has become commonplace to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar, and optical data, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...

Fault location and classification in non-homogeneous transmission line utilizing breaker transients

In this paper, a single-ended fault location method is presented based on a circuit breaker operation using the frequencies of traveling waves. The proposed method receives the required data from voltage traveling waves with the aid of Fast Fourier Transform (FFT) and Wavelet Transform. Then, the Artificial Neural Network (ANN) identifies fault type and determines its location. In order to eval...

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is the negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown the successful application of this algorithm in data classification. Most negative selection methods consider deterministic boundaries to distinguish between self and non-self spaces. In this paper, two...

Publication date: 2007